Smart Paper Notes

Distribution-based Sketching of Single-Cell Samples

Vishal Athreya Baskaran , Jolene Ranek , Siyuan Shan , Natalie Stanley , Junier B. Oliva

Category: Machine Learning

2022-06-30

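Modern high-throughput single-cell immune profiling technologies, such as flow cytometry, mass cytometry, and single-cell RNA sequencing, can readily measure the expression of a large number of protein or gene features across millions of cells in a multi-patient cohort. While bioinformatics approaches can be used to link immune cell heterogeneity to external variables of interest, such as clinical outcome or experimental label, they often struggle to accommodate such a large number of profiled cells. To ease this computational burden, a limited number of cells are typically sketched, or subsampled, from each patient. However, existing sketching approaches either fail to adequately subsample rare cells from rare cell populations or fail to preserve the true frequencies of particular immune cell types. Here, we propose a novel sketching approach based on kernel herding that selects a limited subsample of all cells while preserving the underlying frequencies of immune cell types. We tested our approach on three flow and mass cytometry datasets and on one single-cell RNA sequencing dataset, and demonstrate that the sketched cells (1) more accurately represent the overall cellular landscape and (2) improve performance in downstream analysis tasks, such as classifying patients according to their clinical outcome. An implementation of sketching with kernel herding is publicly available at https://github.com/vishalathreya/set-summarization.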
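To make the idea of kernel herding concrete, here is a minimal sketch in Python. It is not the authors' implementation (see the repository linked in the abstract for that). The function names `rbf_kernel` and `kernel_herding`, the Gaussian kernel, the `gamma` bandwidth, and the choice to sample without replacement are all assumptions made for illustration. The sketch greedily picks cells so that the kernel mean of the selected subset stays close to the kernel mean of the full sample.

```python
import numpy as np


def rbf_kernel(a, b, gamma):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * (a @ b.T)
    )
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def kernel_herding(cells, n_select, gamma=1.0):
    """Greedily choose n_select row indices of `cells` (illustrative only).

    At step t the candidate x maximising
        mean_j k(x, x_j) - (1 / (t + 1)) * sum_{s selected} k(x, x_s)
    is added, which keeps the subset's kernel mean near the full one.
    """
    n = cells.shape[0]
    K = rbf_kernel(cells, cells, gamma)  # O(n^2) memory: toy-sized inputs only
    mean_embedding = K.mean(axis=1)      # similarity of each cell to the whole sample
    running_sum = np.zeros(n)            # similarity of each cell to the picks so far
    available = np.ones(n, dtype=bool)   # assumption: sample without replacement
    selected = []
    for t in range(n_select):
        scores = mean_embedding - running_sum / (t + 1)
        scores = np.where(available, scores, -np.inf)
        best = int(np.argmax(scores))
        selected.append(best)
        available[best] = False
        running_sum += K[:, best]
    return np.array(selected)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical toy sample: an abundant population and a rare one.
    abundant = rng.normal(loc=0.0, scale=0.3, size=(900, 2))
    rare = rng.normal(loc=5.0, scale=0.3, size=(100, 2))
    cells = np.vstack([abundant, rare])
    idx = kernel_herding(cells, n_select=50, gamma=1.0)
    print("fraction of rare cells in sketch:", np.mean(idx >= 900))
```

The full kernel matrix is only practical for small inputs. For samples with hundreds of thousands of cells, one would need an approximation such as random features, which this sketch leaves out.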


Transparent Single-Cell Set Classification with Kernel Mean Embeddings

Siyuan Shan , Vishal Baskaran , Haidong Yi , Jolene Ranek , Natalie Stanley , Junier Oliva

Category: Machine Learning

2022-01-18

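Modern single-cell flow and mass cytometry technologies measure the expression of several proteins in the individual cells of a blood or tissue sample. Each profiled biological sample is therefore represented by a set of hundreds of thousands of multidimensional cell feature vectors, which makes it computationally expensive to predict each sample's associated phenotype with machine learning models. Such a large set cardinality also limits the interpretability of machine learning models, because it is difficult to track how each individual cell influences the final prediction. We propose using kernel mean embeddings to encode the cellular landscape of each profiled biological sample. Although our foremost goal is to build a more transparent model, we find that our method achieves comparable or better accuracy than competing methods while using only a simple linear classifier. As a result, our model contains few parameters yet performs similarly to deep learning models with millions of parameters. In contrast to deep learning approaches, the linearity and sub-selection step of our model make the classification results easy to interpret. Further analysis shows that our method admits rich biological interpretability for linking cellular heterogeneity to clinical phenotype.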
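The following Python sketch illustrates why a linear model on top of a kernel mean embedding is easy to interpret. It is an illustration under stated assumptions, not the paper's code: the random Fourier feature approximation of a Gaussian kernel, the helper names `make_rff`, `featurize`, and `sample_embedding`, and the weight vector `w` are all hypothetical. The point it shows is that a linear score on the mean embedding equals the average of per-cell scores, so each cell's contribution to the prediction can be read off directly.

```python
import numpy as np


def make_rff(n_features, n_components, gamma, rng):
    """Draw random Fourier feature parameters for k(x, y) = exp(-gamma * ||x - y||^2)."""
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(n_features, n_components))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_components)
    return W, b


def featurize(cells, W, b):
    """Map each cell (row) to an explicit feature vector of length n_components."""
    return np.sqrt(2.0 / W.shape[1]) * np.cos(cells @ W + b)


def sample_embedding(cells, W, b):
    """Approximate kernel mean embedding: the average feature vector over all cells."""
    return featurize(cells, W, b).mean(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W, b = make_rff(n_features=5, n_components=256, gamma=0.5, rng=rng)

    # Hypothetical sample: 1000 cells with 5 measured markers each.
    cells = rng.normal(size=(1000, 5))
    mu = sample_embedding(cells, W, b)

    # Hypothetical linear classifier weights (would normally be learned).
    w = rng.normal(size=256)

    sample_score = w @ mu                       # score for the whole sample
    per_cell_scores = featurize(cells, W, b) @ w  # one score per cell
    # Linearity: the sample score is exactly the mean of the per-cell scores.
    print(np.isclose(sample_score, per_cell_scores.mean()))
```

Because of this decomposition, ranking cells by their per-cell score gives one simple way to see which cells push a sample toward a given class.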


Heterogeneous Domain Adaptation and Equipment Matching: DANN-based Alignment with Cyclic Supervision (DBACS)

Natalie Gentner , Gian Antonio Susto

Category: Machine Learning

2023-01-03

Process monitoring and control are essential in modern industries for ensuring high quality standards and optimizing production performance. These technologies have a long history of application in production and have had numerous positive impacts, but also hold great potential when integrated with Industry 4.0 and advanced machine learning, particularly deep learning, solutions. However, in order to implement these solutions in production and enable widespread adoption, the scalability and transferability of deep learning methods have become a focus of research. While transfer learning has proven successful in many cases, particularly with computer vision and homogeneous data inputs, it can be challenging to apply to heterogeneous data. Motivated by the need to transfer and standardize established processes to different, non-identical environments and by the challenge of adapting to heterogeneous data representations, this work introduces the Domain Adaptation Neural Network with Cyclic Supervision (DBACS) approach. DBACS addresses the issue of model generalization through domain adaptation, specifically for heterogeneous data, and enables the transfer and scalability of deep learning-based statistical control methods in a general manner. Additionally, the cyclic interactions between the different parts of the model enable DBACS to not only adapt to the domains, but also match them. To the best of our knowledge, DBACS is the first deep learning approach to combine adaptation and matching for heterogeneous data settings. For comparison, this work also includes subspace alignment and a multi-view learning approach that deals with heterogeneous representations by mapping data into correlated latent feature spaces. Finally, DBACS, with its ability to adapt and match, is applied to a virtual metrology use case for an etching process run on different machine types in semiconductor manufacturing.


Cluster-level Group Representativity Fairness in $k$-means Clustering

Stanley Simoes , Deepak P , Muiris MacCarthaigh

Category: Machine Learning

2022-12-29

There has been much interest recently in developing fair clustering algorithms that seek to do justice to the representation of groups defined along sensitive attributes such as race and gender. We observe that clustering algorithms could generate clusters such that different groups are disadvantaged within different clusters. We develop a clustering algorithm, building upon the centroid clustering paradigm pioneered by classical algorithms such as $k$-means, where we focus on mitigating the unfairness experienced by the most-disadvantaged group within each cluster. Our method uses an iterative optimisation paradigm whereby an initial cluster assignment is modified by reassigning objects to clusters such that the worst-off sensitive group within each cluster is benefitted. We demonstrate the effectiveness of our method through extensive empirical evaluations over a novel evaluation metric on real-world datasets. Specifically, we show that our method is effective in enhancing cluster-level group representativity fairness significantly at low impact on cluster coherence.


From Single-Visit to Multi-Visit Image-Based Models: Single-Visit Models are Enough to Predict Obstructive Hydronephrosis

Stanley Bryan Z. Hua , Mandy Rickard , John Weaver , Alice Xiang , Daniel Alvarez , Kyla N. Velear , Kunj Sheth , Gregory E. Tasian , Armando J. Lorenzo , Anna Goldenberg

Category: Computer Vision | Artificial Intelligence

2022-12-27

Previous work has shown the potential of deep learning to predict renal obstruction using kidney ultrasound images. However, these image-based classifiers have been trained with the goal of single-visit inference in mind. We compare methods from video action recognition (i.e. convolutional pooling, LSTM, TSM) to adapt single-visit convolutional models to handle multiple visit inference. We demonstrate that incorporating images from a patient's past hospital visits provides only a small benefit for the prediction of obstructive hydronephrosis. Therefore, inclusion of prior ultrasounds is beneficial, but prediction based on the latest ultrasound is sufficient for patient risk stratification.


Risk assessment and mitigation of e-scooter crashes with naturalistic driving data

Avinash Prabu , Renran Tian , Stanley Chien , Lingxi Li , Yaobin Chen , Rini Sherony

Category: Computer Vision

2022-12-24

Recently, e-scooter-involved crashes have increased significantly, but little information is available about the behaviors of on-road e-scooter riders. Most existing e-scooter crash research was based on retrospective, descriptive sources: media reports, emergency room patient records, and crash reports. This paper presents a naturalistic driving study with a focus on e-scooter and vehicle encounters. The goal is to quantitatively measure the behaviors of e-scooter riders in different encounters to help facilitate crash scenario modeling, baseline behavior modeling, and the potential future development of in-vehicle mitigation algorithms. The data was collected using an instrumented vehicle and an e-scooter rider wearable system, respectively. A three-step data analysis process is developed. First, semi-automatic data labeling extracts e-scooter rider images and non-rider human images in similar environments to train an e-scooter-rider classifier. Then, a multi-step scene reconstruction pipeline generates vehicle and e-scooter trajectories in all encounters. The final step is to model e-scooter rider behaviors and e-scooter-vehicle encounter scenarios. A total of 500 vehicle-to-e-scooter interactions are analyzed. The variables describing these interactions are also discussed in this paper.


A Wearable Data Collection System for Studying Micro-Level E-Scooter Behavior in Naturalistic Road Environment

Avinash Prabu , Dan Shen , Renran Tian , Stanley Chien , Lingxi Li , Yaobin Chen , Rini Sherony

Category: Computer Vision

2022-12-22

As one of the most popular micro-mobility options, e-scooters are spreading in hundreds of big cities and college towns in the US and worldwide. In the meantime, e-scooters are also posing new challenges to traffic safety. In general, e-scooters are suggested to be ridden in bike lanes/sidewalks or share the road with cars at a maximum speed of about 15-20 mph, which makes them more flexible and much faster than pedestrians and bicyclists. These features make e-scooters challenging for human drivers, pedestrians, vehicle active safety modules, and self-driving modules to see and interact with. To study this new mobility option and address e-scooter riders' and other road users' safety concerns, this paper proposes a wearable data collection system for investigating micro-level e-scooter motion behavior in a naturalistic road environment. An e-scooter-based data acquisition system has been developed by integrating LiDAR, cameras, and GPS using the Robot Operating System (ROS). Software frameworks are developed to support hardware interfaces, sensor operation, sensor synchronization, and data saving. The integrated system can collect data continuously for hours, meeting all the requirements, including calibration accuracy and the capability of collecting vehicle and e-scooter encounter data.


SceNDD: A Scenario-based Naturalistic Driving Dataset

Avinash Prabu , Nitya Ranjan , Lingxi Li , Renran Tian , Stanley Chien , Yaobin Chen , Rini Sherony

Category: Robotics

2022-12-22

In this paper, we propose SceNDD: a scenario-based naturalistic driving dataset that is built upon data collected from an instrumented vehicle in downtown Indianapolis. The data collection was completed in 68 driving sessions with different drivers, where each session lasted about 20-40 minutes. The main goal of creating this dataset is to provide the research community with real driving scenarios that have diverse trajectories and driving behaviors. The dataset contains the ego vehicle's waypoints, velocity, and yaw angle, as well as non-ego actors' waypoints, velocity, yaw angle, entry time, and exit time. Certain flexibility is provided to users so that actors, sensors, lanes, roads, and obstacles can be added to the existing scenarios. We used a Joint Probabilistic Data Association (JPDA) tracker to detect non-ego vehicles on the road. We present some preliminary results of the proposed dataset and a few applications associated with it. The complete dataset is expected to be released by early 2023.


The Third International Verification of Neural Networks Competition (VNN-COMP 2022): Summary and Results

Mark Niklas Müller , Christopher Brix , Stanley Bak , Changliu Liu , Taylor T. Johnson

Category: Machine Learning | Artificial Intelligence

2022-12-20

This report summarizes the 3rd International Verification of Neural Networks Competition (VNN-COMP 2022), held as a part of the 5th Workshop on Formal Methods for ML-Enabled Autonomous Systems (FoMLAS), which was collocated with the 34th International Conference on Computer-Aided Verification (CAV). VNN-COMP is held annually to facilitate the fair and objective comparison of state-of-the-art neural network verification tools, encourage the standardization of tool interfaces, and bring together the neural network verification community. To this end, standardized formats for networks (ONNX) and specification (VNN-LIB) were defined, tools were evaluated on equal-cost hardware (using an automatic evaluation pipeline based on AWS instances), and tool parameters were chosen by the participants before the final test sets were made public. In the 2022 iteration, 11 teams participated on a diverse set of 12 scored benchmarks. This report summarizes the rules, benchmarks, participating tools, results, and lessons learned from this iteration of this competition.


Provable Fairness for Neural Network Models using Formal Verification

Giorgian Borca-Tasciuc , Xingzhi Guo , Stanley Bak , Steven Skiena

Category: Machine Learning

2022-12-16

Machine learning models are increasingly deployed for critical decision-making tasks, making it important to verify that they do not contain gender or racial biases picked up from training data. Typical approaches to achieve fairness revolve around efforts to clean or curate training data, with post-hoc statistical evaluation of the fairness of the model on evaluation data. In contrast, we propose techniques to prove fairness using recently developed formal methods that verify properties of neural network models. Beyond the strength of guarantee implied by a formal proof, our methods have the advantage that we do not need explicit training or evaluation data (which is often proprietary) in order to analyze a given trained model. In experiments on two familiar datasets in the fairness literature (COMPAS and ADULTS), we show that through proper training, we can reduce unfairness by an average of 65.4% at a cost of less than 1% in AUC score.
